Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm
Reverberation, which is generally caused by sound reflections from walls,
ceilings, and floors, can result in severe performance degradation of acoustic
applications. Due to a complicated combination of attenuation and time-delay
effects, the reverberation property is difficult to characterize, and it
remains a challenging task to effectively retrieve the anechoic speech signals
from reverberant ones. In the present study, we propose a novel integrated
deep and ensemble learning algorithm (IDEA) for speech dereverberation. The
IDEA consists of offline and online phases. In the offline phase, we train
multiple dereverberation models, each aiming to precisely dereverberate speech
signals in a particular acoustic environment; a unified fusion function is
then estimated to integrate the information from the multiple dereverberation
models. In the online phase, an input utterance is first processed by each of
the dereverberation models, and the outputs of all models are integrated
accordingly to generate the final anechoic signal. We evaluated the IDEA in
designed acoustic environments, covering both matched and mismatched
conditions between the training and testing data. Experimental results confirm
that the proposed IDEA outperforms a single deep-neural-network-based
dereverberation model with the same model architecture and training data.
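
The online phase amounts to running every environment-specific dereverberation
model on the input and merging their outputs with the learned fusion function.
Below is a minimal Python/PyTorch sketch of that phase; the model list, the
fusion module, and all names here are hypothetical placeholders rather than
the paper's actual implementation.

    import torch

    def idea_online(reverberant, models, fusion):
        """Online phase of the IDEA (sketch): apply every
        environment-specific dereverberation model, then integrate
        their outputs with the offline-trained fusion function."""
        with torch.no_grad():
            # One dereverberated estimate per acoustic environment.
            estimates = [model(reverberant) for model in models]
            # The fusion function merges all estimates into the
            # final anechoic-signal estimate.
            return fusion(torch.stack(estimates, dim=1))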
IANS: Intelligibility-aware Null-steering Beamforming for Dual-Microphone Arrays
Beamforming techniques are popular in speech-related applications due to
their effective spatial filtering capabilities. Nonetheless, conventional
beamforming techniques generally depend heavily on the target's
direction-of-arrival (DOA), relative transfer function (RTF), or covariance
matrix. This paper presents a new approach, the intelligibility-aware
null-steering (IANS) beamforming framework, which uses the STOI-Net
intelligibility prediction model to improve speech intelligibility without
prior knowledge of the speech signal parameters mentioned earlier. The IANS
framework combines a null-steering beamformer (NSBF), which generates a set of
beamformed outputs, with STOI-Net, which determines the optimal result. Experimental
results indicate that IANS can produce intelligibility-enhanced signals using a
small dual-microphone array. The results are comparable to those obtained by
null-steering beamformers given prior knowledge of the DOAs.
Comment: Preprint submitted to IEEE MLSP 2023
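
At its core, IANS is a generate-and-score loop: the NSBF steers a null toward
each candidate interference direction, and STOI-Net selects the output it
predicts to be most intelligible. A minimal sketch, assuming the caller
supplies a beamformer callable and a pretrained STOI-Net-style predictor
(both names are hypothetical stand-ins for the paper's components):

    import numpy as np

    def ians(left, right, candidate_angles, nsbf, stoi_net):
        """Return the NSBF output that the intelligibility
        predictor rates highest (sketch)."""
        best_score, best_out = -np.inf, None
        for angle in candidate_angles:
            out = nsbf(left, right, angle)  # candidate beamformed signal
            score = stoi_net(out)           # predicted STOI score
            if score > best_score:
                best_score, best_out = score, out
        return best_out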
Time-Domain Multi-modal Bone/air Conducted Speech Enhancement
Previous studies have shown that integrating video signals as a complementary
modality can improve speech enhancement (SE) performance. However, video clips
usually contain large amounts of data, impose a high computational cost, and
may thus complicate the SE system. As an alternative source, a bone-conducted speech
signal has a moderate data size while manifesting speech-phoneme structures,
and thus complements its air-conducted counterpart. In this study, we propose a
novel multi-modal SE structure in the time domain that leverages bone- and
air-conducted signals. In addition, we examine two ensemble-learning-based
strategies, early fusion (EF) and late fusion (LF), to integrate the two types
of speech signals, and adopt a deep learning-based fully convolutional network
to conduct the enhancement. The experimental results on the Mandarin corpus
indicate that this newly presented multi-modal (integrating bone- and
air-conducted signals) SE structure significantly outperforms the single-source
SE counterparts (with a bone- or air-conducted signal only) in various speech
evaluation metrics. In addition, adopting the LF strategy rather than the EF
strategy in this novel multi-modal SE structure achieves better results.
Comment: multi-modal, bone/air-conducted signals, speech enhancement, fully
convolutional network
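
The two fusion strategies differ mainly in where the bone- and air-conducted
signals meet. The following Python/PyTorch sketch contrasts them with
deliberately small fully convolutional networks; the layer sizes and module
names are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class EarlyFusionSE(nn.Module):
        """EF (sketch): stack the two waveforms as input channels
        of a single fully convolutional enhancer."""
        def __init__(self):
            super().__init__()
            self.fcn = nn.Sequential(
                nn.Conv1d(2, 64, kernel_size=11, padding=5), nn.ReLU(),
                nn.Conv1d(64, 1, kernel_size=11, padding=5),
            )

        def forward(self, bone, air):            # each: (batch, samples)
            x = torch.stack([bone, air], dim=1)  # (batch, 2, samples)
            return self.fcn(x).squeeze(1)

    class LateFusionSE(nn.Module):
        """LF (sketch): enhance each modality with its own FCN,
        then merge the two estimates with a 1x1 fusion layer."""
        def __init__(self):
            super().__init__()
            def branch():
                return nn.Sequential(
                    nn.Conv1d(1, 64, kernel_size=11, padding=5), nn.ReLU(),
                    nn.Conv1d(64, 1, kernel_size=11, padding=5),
                )
            self.bone_fcn, self.air_fcn = branch(), branch()
            self.fuse = nn.Conv1d(2, 1, kernel_size=1)

        def forward(self, bone, air):
            b = self.bone_fcn(bone.unsqueeze(1))
            a = self.air_fcn(air.unsqueeze(1))
            return self.fuse(torch.cat([b, a], dim=1)).squeeze(1)

In the EF variant a single network sees both modalities from the first layer
on; in the LF variant each modality is enhanced independently and only the
resulting estimates are combined, which is the strategy the abstract reports
as performing better.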